Abstract. Abstract interpretation techniques can be made more precise by
distinguishing paths inside loops, at the expense of possibly exponential
complexity. SMT-solving techniques and sparse representations of paths and sets
of paths avoid this pitfall.

We improve previously proposed techniques for guided static analysis and 
the generation of disjunctive invariants by combining them with techniques for 
succinct representations of paths and symbolic representations for transitions 
based on static single assignment. 

Because of the non-monotonicity of the results of abstract interpretation 
with widening operators, it is difficult to conclude that some abstraction is 
more precise than another based on theoretical local precision results. We thus 
conducted extensive comparisons between our new techniques and previous 
ones, on a variety of open-source packages. 



1. Introduction 

Static analysis by abstract interpretation is a fully automatic program analysis
method. When applied to imperative programs, it computes an inductive invariant
mapping each program location (or a subset thereof) to a set of states represented
symbolically [8]. For instance, if we are only interested in scalar numerical program
variables, such a set may be a convex polyhedron (the set of solutions of a system 
of linear inequalities) (ol, [itI [H, I3| . 

In such an analysis, information may flow forward or backward; forward pro- 
gram analysis computes super-sets of the states reachable from the initialization 
of the program, backward program analysis computes super-sets of the states co- 
reachable from some property of interest (for instance, the violation of an assertion). 
In forward analysis, control-flow joins correspond to convex hulls if using convex 
polyhedra (more generally, they correspond to least upper bounds in a lattice); in 
backward analysis, it is control-flow splits that correspond to convex hulls. 

This work was partially funded by ANR project "ASOPT".
Julien Henry is a graduate student at Université Joseph Fourier, VERIMAG laboratory.
VERIMAG is a joint laboratory of Université Joseph Fourier, CNRS and Grenoble-INP.
David Monniaux is a researcher at CNRS, VERIMAG laboratory.
Matthieu Moy is an assistant professor at Grenoble-INP, VERIMAG laboratory.

JULIEN HENRY, DAVID MONNIAUX, AND MATTHIEU MOY

It is a known limitation of program analysis by abstract interpretation that this
convex hull, or more generally, least upper bound operation, may introduce states
that cannot occur in the real program: for instance, the convex hull of the
intervals $[-2,-1]$ and $[1,2]$ is $[-2,2]$, strictly larger than the union of the two.
Such introduction may prevent proving desired program properties, for instance $x \neq 0$.
The alternative is to keep the union symbolic (e.g. compute using $[-2,-1] \cup [1,2]$)
and thus compute in the disjunctive completion of the lattice, but the number of
terms in the union may grow exponentially with the number of successive tests in
the program to analyze, not to mention the difficulty of designing suitable widening
operators for enforcing the convergence of fixpoint iterations [H, 0, 01. The
exponential growth of the number of terms in the union may be controlled by heuristics
that judiciously apply least upper bound operations, as in the trace partitioning
domain [29] implemented in the Astrée analyzer 0,.
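The loss of precision caused by the interval hull can be reproduced in a few lines. The sketch below is only an illustration under simple assumptions (closed integer intervals encoded as pairs; the helper names `join` and `contains` are ours, not taken from any analyzer): the hull of $[-2,-1]$ and $[1,2]$ contains 0, whereas the symbolic union does not.

```python
def join(a, b):
    """Interval hull (least upper bound) of two closed intervals (lo, hi)."""
    return (min(a[0], b[0]), max(a[1], b[1]))


def contains(interval, value):
    """True when value lies in the closed interval."""
    lo, hi = interval
    return lo <= value <= hi


neg, pos = (-2, -1), (1, 2)

hull = join(neg, pos)   # a single abstract element
union = [neg, pos]      # the union kept symbolic, as a list of disjuncts

# The property "x != 0" is provable only if 0 is excluded.
hull_proves_nonzero = not contains(hull, 0)
union_proves_nonzero = not any(contains(d, 0) for d in union)

print(hull, hull_proves_nonzero, union_proves_nonzero)
```

The list of disjuncts is exact here, but it grows with every test that splits a disjunct, which is the exponential growth mentioned above.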

Assuming we are interested in a loop-free program fragment, the above approach
of keeping symbolic unions gives the same results as performing the analysis
separately over every path in the fragment. A recent method for finding disjunctive
loop invariants [16] is based on this idea: each path inside the loop body is
considered separately. Two recent proposals use SMT-solving [23] as a decision
procedure for the satisfiability of first-order arithmetic formulas in order to
enumerate only paths that are needed for the progress of the analysis [13], [28].
They can equivalently be seen as analyses over a multigraph of transitions between
some distinguished control nodes. This multigraph has an exponential number of
edges, but is never explicitly represented in memory; instead, this graph is
implicitly or succinctly represented: its edges are enumerated as needed as
solutions to SMT problems.

An additional claim in favor of the methods that distinguish paths inside the
loop body [13, 28] is that they tend to generate better invariants than methods
that do not, by behaving better with respect to the widening operators [8] used for
enforcing convergence when searching for loop invariants by Kleene iterations. A
related technique, guided static analysis [15], computes successive loop invariants
for increasing subsets of the transitions taken into account, until all transitions
are considered; again, the claim is that this approach avoids some gross
over-approximation introduced by widenings.

All these methods improve the precision of the analysis by keeping the same
abstract domain (say, convex polyhedra) but changing the operations applied and
their ordering. An alternative is to change the abstract domain (e.g. octagons,
convex polyhedra [26]), or the widening operator 0, [3l.

This article makes the following contributions: 

(1) We recast the guided static analysis technique from [15] on the expanded
multigraph from [28], considering entire paths instead of individual transitions,
using SMT queries and binary decision diagrams (see Section 3).

(2) We improve the technique for obtaining disjunctive invariants from [16] by
replacing the explicit exhaustive enumeration of paths by a sequence of
SMT queries (see Section 4).

(3) We implemented these techniques, in addition to "classical" iterations and
the original guided static analysis, inside a prototype static analyzer. This
tool uses the LLVM bitcode format 2J, [25] as input, which can be produced
by compilation from C, C++ and Fortran, enabling it to be run on many
real-life programs. It uses the APRON library [22], which supports a variety
of abstract domains for numerical variables, from which we can choose with
minimal changes to our analyzer.

(4) We conducted extensive experiments with this tool, on real-life programs.



2. Bases 

2.1. Static Analysis by Abstract Interpretation. Let $X$ be the set of possible
states of the program variables; for instance, if the program has 3 unbounded
integer variables, then $X = \mathbb{Z}^3$. The set $\mathcal{P}(X)$ of subsets of $X$,
partially ordered by inclusion, is the concrete domain. An abstract domain is a set
$X^\sharp$ equipped with a partial order $\sqsubseteq$ (the associated strict order
being $\sqsubset$); for instance, it can be the domain of convex polyhedra in
$\mathbb{Q}^3$ ordered by geometric inclusion. The concrete and abstract domains are
connected by a monotone concretization function
$\gamma : (X^\sharp, \sqsubseteq) \to (\mathcal{P}(X), \subseteq)$: an element
$x^\sharp \in X^\sharp$ represents a set $\gamma(x^\sharp)$.

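We also assume a join operator $\sqcup : X^\sharp \times X^\sharp \to X^\sharp$, with
infix notation; in practice, it is generally a least upper bound operation, but we
only need it to satisfy $\gamma(x^\sharp) \cup \gamma(y^\sharp) \subseteq \gamma(x^\sharp \sqcup y^\sharp)$
for all $x^\sharp, y^\sharp$.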

Classically, one considers the control-flow graph of the program, with edges
labeled with concrete transition relations (e.g. $x' = x+1$ for an instruction x = x+1;),
and attaches an abstract element to each control point. A concrete transition
relation $\tau \subseteq X \times X$ is replaced by a forward abstract transformer
$\tau^\sharp : X^\sharp \to X^\sharp$, such that
$\forall x^\sharp \in X^\sharp,\ \forall x, x' \in X,\ x \in \gamma(x^\sharp) \land (x,x') \in \tau \implies x' \in \gamma \circ \tau^\sharp(x^\sharp)$.
It is easy to see that if to any control point $p \in P$ we attach an abstract element
$x_p^\sharp$ such that (i) for any $p$, $\gamma(x_p^\sharp)$ includes all initial states
possible at control node $p$, and (ii) for any $p, p'$,
$\tau^\sharp_{p,p'}(x_p^\sharp) \sqsubseteq x_{p'}^\sharp$, noting $\tau_{p,p'}$ the
transition from $p$ to $p'$, then $(\gamma(x_p^\sharp))_{p \in P}$ form an inductive
invariant: by induction, when the control point is $p$, the program state always
lies in $\gamma(x_p^\sharp)$.

Kleene iterations compute such an inductive invariant as the stationary limit,
if it exists, of the following system: for each $p$, initialize $x_p^\sharp$ such that
$\gamma(x_p^\sharp)$ is a superset of the initial states at point $p$; then iterate the
following: if $\tau^\sharp_{p,p'}(x_p^\sharp) \not\sqsubseteq x_{p'}^\sharp$, replace
$x_{p'}^\sharp$ by $x_{p'}^\sharp \sqcup \tau^\sharp_{p,p'}(x_p^\sharp)$. Such a stationary
limit is bound to exist if $X^\sharp$ has no infinite ascending chain
$a_1 \sqsubset a_2 \sqsubset \dots$; this condition is however not met by
domains such as intervals or convex polyhedra.

Widening-accelerated Kleene iterations proceed by replacing
$x_{p'}^\sharp \sqcup \tau^\sharp_{p,p'}(x_p^\sharp)$ by
$x_{p'}^\sharp \nabla (x_{p'}^\sharp \sqcup \tau^\sharp_{p,p'}(x_p^\sharp))$ where $\nabla$
is a widening operator: for all $x^\sharp, y^\sharp$,
$\gamma(y^\sharp) \subseteq \gamma(x^\sharp \nabla y^\sharp)$, and any sequence
$u_1^\sharp, u_2^\sharp, \dots$ of the form $u_{n+1}^\sharp = u_n^\sharp \nabla v_n^\sharp$,
where $v_n^\sharp$ is another sequence, becomes stationary. The stationary limit
$(x_p^\sharp)_{p \in P}$ defines an inductive invariant $(\gamma(x_p^\sharp))_{p \in P}$.
Note that this invariant is not, in general, the least one expressible in the
abstract domain, and may depend on the iteration ordering (the successive
choices $p, p'$).

Once an inductive invariant $(\gamma(x_p^\sharp))_{p \in P}$ has been obtained, one can
attempt decreasing or narrowing iterations to reduce it. In their simplest form,
this just means running the following operation until a fixpoint or a maximal
number of iterations is reached: for any $p'$, replace $x_{p'}^\sharp$ by
$x_{p'}^\sharp \sqcap \left(\bigsqcup_{p \in P} \tau^\sharp_{p,p'}(x_p^\sharp)\right)$. The
result also defines an inductive invariant. These decreasing iterations are
indispensable to recover properties from guards (tests) in the program in most
iteration settings; unfortunately, certain loops, particularly those involving
identity (no-operation) transitions, may foil them: the iterations immediately
reach a fixpoint and do not decrease further (see example in §2.3). Sections 2.4
and 2.5 describe techniques that work around this problem.
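The interplay between widening and decreasing iterations can be sketched on the interval domain. The code below is a minimal illustration, not the analyzer described in this article: it assumes a hypothetical one-variable loop `x = 0; while (x < 100) x++;`, the standard interval widening, and a decreasing step that also joins the initial states at the loop head. All function names are ours.

```python
import math

INF = math.inf
INIT = (0, 0)  # initial states at the loop head: x = 0


def join(a, b):
    return (min(a[0], b[0]), max(a[1], b[1]))


def meet(a, b):
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None  # None encodes the empty set


def leq(a, b):
    """Inclusion test between two non-empty intervals."""
    return b[0] <= a[0] and a[1] <= b[1]


def widen(a, b):
    """Standard interval widening: unstable bounds jump to infinity."""
    lo = a[0] if a[0] <= b[0] else -INF
    hi = a[1] if a[1] >= b[1] else INF
    return (lo, hi)


def body(x):
    """Abstract transformer of the guard x < 100 followed by x = x + 1."""
    guarded = meet(x, (-INF, 99))
    return None if guarded is None else (guarded[0] + 1, guarded[1] + 1)


def step(x):
    """Initial states joined with the image of x by the loop body."""
    image = body(x)
    return INIT if image is None else join(INIT, image)


# Ascending iterations with widening.
x = INIT
while not leq(step(x), x):
    x = widen(x, join(x, step(x)))
widened = x

# Decreasing iterations (bounded number of steps).
for _ in range(3):
    x = meet(x, step(x))
narrowed = x

print(widened, narrowed)
```

Widening extrapolates the upper bound to infinity; the decreasing step then recovers the bound 100 from the guard. The rate limiter of §2.3 is a case where this recovery does not happen.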

2.2. SMT-solving. Boolean satisfiability (SAT) is the canonical NP-complete
problem: given a propositional formula (e.g. $(a \lor \lnot b) \land (\lnot a \lor b \lor \lnot c)$),
decide whether it is satisfiable and, if so, output a satisfying assignment.
Despite an exponential worst-case complexity, the DPLL algorithm [23, 6] solves
many useful SAT problems in practice.
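To make the problem statement concrete, the sketch below decides the example formula by exhaustive enumeration of the $2^3$ assignments. This is deliberately naive and is not DPLL; it only shows what "satisfiable" and "satisfying assignment" mean. The function names are ours.

```python
from itertools import product


def formula(a, b, c):
    """(a or not b) and (not a or b or not c)"""
    return (a or not b) and ((not a) or b or (not c))


def brute_force_sat(f, nvars):
    """Return the first satisfying assignment found, or None if there is none."""
    for assignment in product([False, True], repeat=nvars):
        if f(*assignment):
            return assignment
    return None


model = brute_force_sat(formula, 3)
print(model)
```

The enumeration is exponential in the number of variables, which is precisely what DPLL-style solvers avoid on many practical instances.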

SAT was extended to satisfiability modulo theory (SMT): in addition to
propositional literals, SMT formulas admit atoms from a theory. For instance, the
theories of linear integer arithmetic (LIA) and linear real arithmetic (LRA) have
atoms of the form $a_1 x_1 + \dots + a_n x_n \bowtie C$ where $a_1, \dots, a_n, C$ are
integer constants, $x_1, \dots, x_n$ are variables (interpreted over $\mathbb{Z}$ for
LIA and $\mathbb{R}$ or $\mathbb{Q}$ for LRA), and $\bowtie$ is a comparison operator
$=, \neq, <, \le, >, \ge$. Satisfiability for LIA and LRA is NP-complete,
yet tools based on the DPLL(T) approach [23], [6] solve many useful SMT problems in
practice. All these tools provide a satisfying assignment if the problem is satisfiable.

2.3. A Simple, Motivating Example. Consider the following program, adapted 
from [2I], where input(a, b) stands for a nondeterministic input in [a, b] (the control- 
flow graph on the right depicts the loop body, s is the start node and e the end 
node): 



1 void rate_limiter() {
2   int x_old = 0;
3   while (1) {
4     int x = input(-100000, 100000);
5     if (x > x_old + 10) x = x_old + 10;
6     if (x < x_old - 10) x = x_old - 10;
7     x_old = x;
8 } }



This program implements a construct commonly found in control programs (in 
e.g. automotive or avionics): a rate or slope limiter. 

The expected inductive invariant is x_old $\in [-100000, 100000]$, but classical
abstract interpretation using intervals (or octagons or polyhedra) finds
x_old $\in (-\infty, +\infty)$ [10]. Let us briefly see why.

Widening iterations converge to x_old $\in (-\infty, +\infty)$; let us now see why
decreasing iterations fail to recover the desired invariant. The x > x_old+10 test
at line 5, if taken, yields x_old $\in (-\infty, 99990)$; followed by x = x_old+10, we
obtain x $\in (-\infty, 100000)$, and the same after union with the no-operation
"else" branch. Line 6 yields x $\in (-\infty, +\infty)$.
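The failure of the decreasing iterations can be replayed with a small interval interpreter for the loop body. This is a sketch under our own assumptions (integer semantics, non-relational intervals for x and x_old, hypothetical helper names such as `assume_less`); it is not the implementation evaluated in this article.

```python
import math

INF = math.inf
TOP = (-INF, INF)


def join(a, b):
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


def meet(a, b):
    if a is None or b is None:
        return None
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def add(a, k):
    return None if a is None else (a[0] + k, a[1] + k)


def assume_less(u, v, k):
    """Refine intervals u and v under the integer test u < v + k."""
    if u is None or v is None:
        return None, None
    u2 = meet(u, (-INF, v[1] + k - 1))
    v2 = meet(v, (u[0] - k + 1, INF))
    if u2 is None or v2 is None:
        return None, None
    return u2, v2


def loop_body(x_old):
    """Returns (x after the first if, new x_old) for one loop iteration."""
    x = (-100000, 100000)
    # if (x > x_old + 10) x = x_old + 10;
    xo_then, _ = assume_less(x_old, x, -10)   # x_old < x - 10
    x_then = add(xo_then, 10)
    x_else, xo_else = assume_less(x, x_old, 11)   # x <= x_old + 10
    x, x_old = join(x_then, x_else), join(xo_then, xo_else)
    after_first_if = x
    # if (x < x_old - 10) x = x_old - 10;
    _, xo_then = assume_less(x, x_old, -10)   # x < x_old - 10
    x_then = add(xo_then, -10)
    _, x_else = assume_less(x_old, x, 11)     # x >= x_old - 10
    x = join(x_then, x_else)
    return after_first_if, x                  # x_old = x


first_iteration = loop_body((0, 0))
from_top = loop_body(TOP)
# One decreasing step at the loop head, starting from the widened value.
narrowed = meet(TOP, join((0, 0), from_top[1]))
print(first_iteration, from_top, narrowed)
```

Starting from the widened value, x is bounded above after the first test but becomes unbounded again after the second, so the decreasing step leaves x_old at $(-\infty, +\infty)$.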

We could use "widening up to" or "widening with thresholds", propagating the
"magic values" ±100000 associated to x into x_old, but these syntactic approaches
cannot directly cope with programs for which x $\in [-100000, +100000]$ is itself
obtained by analysis. The guided static analysis of [15] does not perform better,
and also obtains x_old $\in (-\infty, +\infty)$.

In contrast, let us distinguish all four possible execution paths through the tests
at lines 5 and 6. The path through both "then" branches is infeasible; the program
is thus equivalent to a program with 3 paths:

1 void rate_limiter() {
2   int x_old = 0;
3   while (1) {
4     int x = input(-100000, 100000);
5     if (x > x_old + 10) x_old = x_old + 10;
6     else if (x < x_old - 10) x_old = x_old - 10;
7     else x_old = x;
8 } }

Classical interval analysis on this program yields x_old $\in [-100000, 100000]$. We
have transformed the program, manually pruning out infeasible paths; yet in general
the resulting program could be exponentially larger than the first, even though not
all feasible paths are needed to compute the invariant.

Following recent suggestions [13, 28], we avoid this space explosion by keeping
the second program implicit while simulating its analysis. This means we work
on an implicitly represented transition multigraph; it is succinctly represented
by the transition graph of the first program. Our first contribution (§3) is to
recast the "guided analysis" from [15] on such a succinct representation of the
paths in lieu of the individual transitions. A similar explosion occurs in
disjunctive invariant generation, following [16]: our second contribution applies
our implicit representation to their method.

1 int x = 0, y = 0;
2 while (1) {
3   if (x <= 50) y++;
4   else y--;
5   if (y < 0) break;
6   x++;
7 }

$y \le x \land y \le 102 - x \land y \ge 0$

Figure 1. Example program and its invariant: the piecewise linear, solid line is
the strongest invariant, the grayed polyhedron is its convex hull.

2.4. Guided Static Analysis. Guided static analysis was proposed by [15] as an
improvement over classical upward Kleene iterations with widening. Consider the
program in Fig. 1, taken from [15].



Classical iterations on the domain of convex polyhedra [9, 2] or octagons [26] start
with $x = 0 \land y = 0$, then continue with $x = y \land 0 \le x \le 1$. The widening
operator extrapolates from these two iterations and yields $x = y \land x \ge 0$. From
there, the "else" branch at line 4 may be taken; with further widening,
$0 \le y \le x$ is obtained as a loop invariant, and thus the computed loop
postcondition is $x \ge 0 \land y = 0$. Yet the strongest invariant is
$(0 \le x \le 51 \land y = x) \lor (51 \le x \le 102 \land x + y = 102)$,
and its convex hull, a convex polyhedron (Fig. 1).
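The strongest invariant and its convex hull can be checked against a concrete execution of the program of Fig. 1. The sketch below simply runs the loop and records the states observed at the loop head; it is a sanity check on this single deterministic run, not a proof technique, and the function names are ours.

```python
def loop_head_states():
    """Run the program of Fig. 1 and record (x, y) at each visit of the loop head."""
    x, y = 0, 0
    states = []
    while True:
        states.append((x, y))
        if x <= 50:
            y += 1
        else:
            y -= 1
        if y < 0:
            break
        x += 1
    return states


def strongest(x, y):
    """The two-phase invariant quoted in the text."""
    return (0 <= x <= 51 and y == x) or (51 <= x <= 102 and x + y == 102)


def hull(x, y):
    """The convex polyhedron of Fig. 1."""
    return y <= x and y <= 102 - x and y >= 0


states = loop_head_states()
print(len(states), states[0], states[-1])
```

Every recorded state satisfies both formulas, while a point such as (50, 10) lies in the hull but not in the strongest invariant, which illustrates what the convex hull adds.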

Intuitively, this disappointing result is obtained because widening extrapolates
from the first iterations of the loop, but the loop has two different phases
($x \le 50$ and $x > 50$) with different behaviors, thus the extrapolation from the
first phase is not valid for the second.

Gopan and Reps' idea is to analyze the first phase of the loop with a widening
and narrowing sequence, and thus obtain $0 \le x \le 50 \land y = x$, and then analyze
the second phase, finally obtaining the convex-hull invariant
$y \le x \land y \le 102 - x \land y \ge 0$ of Fig. 1: each phase is identified by the
tests taken or not taken.

The analysis starts by identifying the tests taken and not taken during the first 
iteration of the loop, starting in the loop initialization. The branches not taken are 
pruned from the loop body, yielding: 

while (1) {
  if (x <= 50) y++;
  else break; /* not taken in phase 1 */
  if (y < 0) break;
  x++;
}

Analyzing this loop using widening and narrowing on convex polyhedra or
octagons yields the loop invariant $0 \le x \le 51 \land y = x$. Now, the transition at
line 4 becomes feasible; we analyze the full loop, starting iterations from
$0 \le x \le 51 \land y = x$, and obtain the invariant
$y \le x \land y \le 102 - x \land y \ge 0$ shown in Fig. 1.

More generally, this analysis method considers an ascending sequence of subsets
of the transitions in the loop body; for each subset, an inductive invariant is
computed for the program restricted to it. The starting subset consists of the
transitions reachable in one step from the loop initialization. If for a given
subset S in the sequence, no transitions outside S are reachable from the inductive
invariant attached to S, then iterations stop; otherwise, add these transitions
to S and iterate more. Termination ensues from the finiteness of the control-flow
graph.

2.5. Path-focusing. Monniaux & Gonnord's path-focusing [28] technique
distinguishes the different paths in the program in order to avoid loss of
precision due to merge operations. Since the number of paths may be exponential,
the technique keeps them implicit and computes them when needed using SMT-solving.
The (accelerated) Kleene iterations (§2.1) are computed over a reduced multigraph
instead of the classical transition graph.

Let $P$ be the set of control points in the transition graph, $P_W \subseteq P$ the set
of widening points such that removing the points in $P_W$ gives an acyclic graph. One
can choose a set $P_R$ such that $P_W \subseteq P_R \subseteq P$.

The set of paths is kept implicit by an SMT formula $\rho$ expressing the semantics
of the program, assuming that the transition semantics can be expressed within a
decidable theory. For an easy construction of $\rho$, we also assume that the program
is expressed in SSA form, meaning that each variable is only assigned once in the
transition graph. This is not a restriction, since there exist standard algorithms
that transform a program into SSA form.

This formula contains Boolean reachability predicates $b_i$ for each control point
$p_i \notin P_R$, and $b_i^s$ and $b_i^d$ for each $p_i \in P_R$, so that a path
$p_{i_1} \to p_{i_2} \to \dots \to p_{i_n}$ between two points $p_{i_1}, p_{i_n} \in P_R$
can easily be expressed as the conjunction
$b_{i_1}^s \land \bigwedge_{2 \le k < n} b_{i_k} \land b_{i_n}^d$. The Boolean $b_i^s$ is
true when the path starts at point $p_i$, whereas $b_i^d$ is true when the path
arrives at $p_i$. In other words, we split the points in $P_R$ into a source point,
with only outgoing transitions, and a destination point, with only incoming
transitions, so that the resulting graph is acyclic and there are no paths going
through control points in $P_R$.

In order to find focus paths, we solve an SMT formula which is satisfiable when
there exists a path starting at a point $p_i \in P_R$ in a state included in the
current invariant candidate $X_i$, and arriving at a point $p_j \in P_R$ in a state
outside $X_j$. In this case, we construct this path using the model and update $X_j$.
When $p_i = p_j$, meaning that the path is actually a self-loop, we can apply a
widening/narrowing sequence, or even compute the transitive closure of the loop
(or an approximation thereof, or its application to $X_i$) using abstract
acceleration [14].
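The self-loop treatment can be sketched on the rate limiter of §2.3 with intervals. In the code below, which is an illustration under our own assumptions rather than the tool's implementation, the three feasible paths are written by hand as interval transformers on x_old (integer semantics), and an inclusion test on their images stands in for the SMT query; the names `path_up`, `path_down`, `path_copy`, `focus_self_loop` and `analyze` are hypothetical.

```python
import math

INF = math.inf
LO, HI = -100000, 100000   # range of input() in the rate limiter


def join(a, b):
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


def meet(a, b):
    if a is None or b is None:
        return None
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def leq(a, b):
    if a is None:
        return True
    return b is not None and b[0] <= a[0] and a[1] <= b[1]


def widen(a, b):
    return (a[0] if a[0] <= b[0] else -INF, a[1] if a[1] >= b[1] else INF)


def shift(a, k):
    return None if a is None else (a[0] + k, a[1] + k)


def path_up(xo):     # x > x_old + 10, then x_old = x_old + 10
    return shift(meet(xo, (-INF, HI - 11)), 10)


def path_down(xo):   # x < x_old - 10, then x_old = x_old - 10
    return shift(meet(xo, (LO + 11, INF)), -10)


def path_copy(xo):   # x within 10 of x_old, then x_old = x
    return meet((LO, HI), (xo[0] - 10, xo[1] + 10))


def focus_self_loop(x, path):
    """Widening then narrowing restricted to a single self-loop path."""
    y = x
    while not leq(path(y), y):
        y = widen(y, join(y, path(y)))
    for _ in range(2):                     # bounded decreasing sequence
        y = meet(y, join(x, path(y)))
    return y


def analyze(init, paths):
    x = init
    changed = True
    while changed:
        changed = False
        for path in paths:
            if not leq(path(x), x):        # stand-in for a satisfiable SMT query
                x = focus_self_loop(x, path)
                changed = True
    return x


result = analyze((0, 0), [path_up, path_down, path_copy])
print(result)
```

Because each path is widened and narrowed on its own, the guard of that path bounds the extrapolated value again during the decreasing step; the final value is stable under all three paths.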

We assume that we can encode the concrete semantics of the program into the
SMT formula, or at least an abstraction thereof at least as precise as the one
applied by the abstract interpreter (in simple terms: we want to avoid the case
where the SMT solver exhibits a possible path, but the static analyzer realizes that
this path is infeasible; this would lead to nontermination, because the SMT solver
would exhibit the same path on the next iteration). A workaround would be to
apply satisfiability modulo path programs [19]: from each path ruled infeasible by
abstract interpretation, extract a blocking clause for the SAT solver underlying the
SMT-solver.



3. Guided Analysis over the Paths 



Guided static analysis, as proposed by [15], applies to the transition graph of the
program. We now present a new technique applying this analysis on the implicit
multigraph from [28], thus avoiding control flow merges with infeasible paths. In
this section, we use the same notations as in §2.5.

The combination of these two techniques aims at first discovering a precise
inductive invariant for a subset of paths between two points in $P_R$, by means of
ascending and narrowing iterations. When an inductive invariant has been found,
we add new feasible paths to the subset and compute an inductive invariant for this
new subset, starting with the results from the previous analysis. In other words,
our technique considers an ascending sequence of subsets of the paths between two
points in $P_R$. We iterate the operations until the whole program (i.e. all the
feasible paths) has been considered. The result will then be an inductive invariant
of the entire program.

The ascending iteration applies path-focusing [28] to a subset of the multigraph.
As in [15], we do some narrowing, to recover precision lost by widening, before
computing and taking into account new feasible paths. Thus, our technique combines
the advantages of Guided Static Analysis and Path-focusing.

Algorithm 1 performs guided static analysis on the implicitly represented
multigraph. $I_p$ denotes a set of initial states at program point $p$ (thus
$I_p = \emptyset$ for most $p$). The current working subset of paths, noted $P$ and
initially empty, is stored using a compact representation, such as binary decision
diagrams. We also maintain two sets of control points:

• $A'$: points in $P_R$ that may be the starting points of new feasible paths.

• $A$: points in $P_R$ on which we apply the ascending iterations. When the
abstract value of a control point $p$ is updated, $p$ is added to both $A$ and $A'$.



Algorithm 1 Guided static analysis on implicit multigraph
 1: $A' \leftarrow \{p \in P_R \mid I_p \neq \emptyset\}$
 2: $A \leftarrow \emptyset$
 3: $P \leftarrow \emptyset$ // Paths in the current subset
 4: for all $p_i \in P_R$ do
 5:   $X_i \leftarrow I_{p_i}$
 6: end for
 7: while $A' \neq \emptyset$ do
 8:   while $A' \neq \emptyset$ do
 9:     Select $p_i \in A'$
10:     $A' \leftarrow A' \setminus \{p_i\}$
11:     ComputeNewPaths($p_i$) // Update $A$, $A'$ and $P$
12:   end while
13:   // ascending iterations on $P$
14:   while $A \neq \emptyset$ do
15:     Select $p_i \in A$
16:     $A \leftarrow A \setminus \{p_i\}$
17:     PathFocusing($p_i$) // Update $A$ and $A'$
18:   end while
19:   Narrow
20: end while
21: return $\{X_i, i \in P_R\}$



We distinguish three phases in the main loop of the analysis:

(1) We start by finding a new relevant subset $P$ of the graph. Either the previous
iteration or the initialization led us to a state where there are no more paths
in the previous subset $P$, starting at $p_i$, that make the abstract values of
the successors grow (otherwise, the SMT solver would not have answered
"unsat"). Narrowing iterations preserve this property. However, there may
exist such paths in the entire multigraph that are not in $P$. This phase
computes these paths and adds them to the subset. This phase is described
in §3.2 and corresponds to lines 8 to 12 in Algorithm 1.

(2) Given a new subset $P$, we search for paths starting at point $p_i \in P_R$, such
that these paths are in $P$, i.e. are included in the working subgraph. Each
time we find a path, we update the abstract value of the destination point
of the path. This is the phase explained in §3.1 and corresponds to lines 14
to 18 in Algorithm 1.

(3) We perform narrowing iterations the usual way (line 19 in Algorithm 1) and
reiterate from step 1 unless there are no more points to explore, i.e. $A' = \emptyset$.

The order of steps is important: narrowing has to be performed before adding
new paths, or spurious new paths would be added to $P$. Starting with the addition
of new paths avoids doing the ascending iterations on an empty graph.

3.1. Ascending Iterations by Path-focusing. For computing an inductive
invariant over a subgraph, we use the Path-focusing algorithm from [28] with special
treatment for self loops (line 17 in Algorithm 1).

In order to find which path to focus on, we construct an SMT formula $f(p_i)$,
whose model when satisfiable is a path that starts in $p_i$, goes to a successor
$p_j \in P_R$ of $p_i$, such that the image of $X_i$ by the path transformation is not
included in the current $X_j$. Intuitively, such a path makes the abstract value $X_j$
grow, and thus is an interesting path to focus on. We loop until the formula
becomes unsatisfiable, meaning that the analysis of $p_i$ is finished.

If we note $Succ(i)$ the set of indices $j$ such that $p_j \in P_R$ is a successor of
$p_i$ in the expanded multigraph, and $X_i$ the abstract value associated to $p_i$:

$$f(p_i) = \rho \land b_i^s \land \bigwedge_{j \in P_R,\, j \neq i} \lnot b_j^s \land X_i \land \bigvee_{j \in Succ(i)} \left(b_j^d \land \lnot X_j\right)$$



The difference with [28] is that we do not work on the entire transition graph but on
a subset of it. Therefore we conjoin the formula $f(p_i)$ with the actual set of working
paths, noted $P$, expressed as a Boolean formula, where the Boolean variables are the
reachability predicates of the control points. We can easily construct this formula
from the binary decision diagram using dynamic programming, thus avoiding an
exponentially sized formula. In other words, we force the SMT solver to give us
a path included in $P$. Each time the invariant candidate of a point $p_j$ has been
updated, $p_j$ is inserted into $A'$ since it may be the start of a new feasible path.

3.2. Adding New Paths. Our technique computes the fixpoint iterations on an
ascending sequence of subgraphs, until the complete graph is reached. When the
analysis of a subgraph is finished, meaning that the abstract values for each control
point have converged to an inductive invariant for this subgraph, the next subgraph
to work on has to be computed.

This new subgraph contains all the paths from the previous one, and also new
paths that become feasible with respect to the current abstract values. The new
paths in $P$ are computed one after another, until no more paths can make the
invariant grow. This is line 11 in Algorithm 1, which corresponds to Algorithm 2.
We also use SMT solving to discover these new paths, but we subtly change the SMT
formula given to the SMT solver: we now try to find a path that is not yet in $P$,
but is feasible and makes the invariant candidate of its destination grow. We thus
check the satisfiability of the formula $f'(p_i)$, where:

$$f'(p_i) = f(p_i) \land \lnot P$$

$X_j$ is updated using an abstract union when the point $p_j$ is the target of a new
path. This way, further SMT queries do not compute other paths with the same
source and destination if it is not needed (because these new paths would not make
$X_j$ grow, hence would not be returned by the SMT solver).

When a new path has been found, it is immediately added into $P$. We then have
to add $p_i$ and $p_j$ into $A$ (since we do not apply widening in this section) and
$p_j$ into $A'$, since $p_j$ may be the starting point of a new feasible path.
Algorithm 2 ComputeNewPaths
 1: while true do
 2:   res $\leftarrow$ SmtSolve[$f'(p_i)$]
 3:   if res = unsat then
 4:     break
 5:   end if
 6:   Compute the path $e$ from the model
 7:   $X_j \leftarrow X_j \sqcup \tau_e(X_i)$
 8:   $P \leftarrow P \cup \{e\}$
 9:   $A \leftarrow A \cup \{p_i\}$
10:   $A' \leftarrow A' \cup \{p_j\}$
11: end while
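The loop of Algorithm 2 can be sketched with an explicit list of candidate paths taking the place of the SMT query. Everything specific in the code below is an assumption made for illustration: intervals on x_old as abstract values, hand-written transformers for three paths of the rate limiter, and a candidate list whose order plays the role of the solver's choice of model.

```python
import math

INF = math.inf
LO, HI = -100000, 100000   # range of input() in the rate limiter


def join(a, b):
    """Interval hull; None stands for the empty set."""
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


def meet(a, b):
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def leq(a, b):
    if a is None:
        return True
    return b is not None and b[0] <= a[0] and a[1] <= b[1]


def shift(a, k):
    return None if a is None else (a[0] + k, a[1] + k)


# Hypothetical transformers on x_old for three paths from p3 to p8.
def else_else(xo):   # x within 10 of x_old, then x_old = x
    return meet((LO, HI), (xo[0] - 10, xo[1] + 10))


def then_else(xo):   # x > x_old + 10, then x_old = x_old + 10
    return shift(meet(xo, (-INF, HI - 11)), 10)


def else_then(xo):   # x < x_old - 10, then x_old = x_old - 10
    return shift(meet(xo, (LO + 11, INF)), -10)


def compute_new_paths(i, X, P, A, A_prime, candidates):
    """candidates: list of (name, source, target, transformer) tuples."""
    while True:
        found = None
        for name, src, dst, tau in candidates:
            # "not in P" mirrors the conjunct that excludes known paths.
            if src != i or name in P or X[src] is None:
                continue
            image = tau(X[src])
            if not leq(image, X[dst]):     # the path makes X[dst] grow
                found = (name, dst, image)
                break
        if found is None:                  # corresponds to "unsat"
            break
        name, j, image = found
        X[j] = join(X[j], image)
        P.add(name)
        A.add(i)
        A_prime.add(j)


X = {3: (0, 0), 8: None}
P, A, A_prime = set(), set(), set()
candidates = [
    ("else-else", 3, 8, else_else),
    ("then-else", 3, 8, then_else),
    ("else-then", 3, 8, else_then),
]
compute_new_paths(3, X, P, A, A_prime, candidates)
print(X, P, A, A_prime)
```

With this ordering a single path is added: once the value at the target has absorbed its image, the images of the two other paths are already included, so they are never enumerated.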



3.3. Termination. Termination of this algorithm is guaranteed, because: (1) the
subset of paths $P$ strictly increases at each loop iteration, and is bounded by the
finite set of paths in the entire graph; (2) when computing new paths, we conjoin
our formula with $\lnot P$, meaning that we obtain each possible path only once. The
number of paths is finite, so this computation always terminates; (3) the
Path-focusing iterations terminate because of the properties of widening.



3.4. Example. We revise the rate limiter described in §2.3. In this example,
Path-focusing works well because all the paths starting at the loop header are
actually self loops. In such a case, the technique performs a widening/narrowing
sequence or accelerates the loop, thus leading to a precise invariant. However, in
some cases, there also exist paths that are not self loops, in which case
Path-focusing applies widening. This widening may induce unrecoverable loss of
precision.
Suppose the main loop of the rate limiter contains a nested loop like:

1 void rate_limiter() {
2   int x_old = 0;
3   while (1) {
4     int x = input(-100000, 100000);
5     if (x > x_old+10) x = x_old+10;
6     if (x < x_old-10) x = x_old-10;
7     x_old = x;
8     while (wait()) {}
9 } }

We choose $P_R$ as the set of loop headers of the function, plus the initial state.
In this case, we have three elements in $P_R$.

The main loop in the expanded multigraph then has 4 distinct paths going to
the header of the nested loop.

Guided static analysis from [15] yields, at line 3, x_old $\in (-\infty, +\infty)$.
Path-focusing [28] also finds x_old $\in (-\infty, +\infty)$. Now, let us see how our
technique performs on this example.

Figure 2 shows the sequence of subsets of paths during the analysis. The points
in $P_R$ are noted $p_i$, where $i$ is the corresponding line in the code: for
instance, $p_3$ corresponds to the header of the main loop.

Figure 2. Ascending sequence of subgraphs (Step 1 and Step 2).

(1) The starting subgraph is depicted on Figure 2, Step 1. At the beginning,
this graph has no transitions.

(2) We compute the new feasible paths that have to be added into the subgraph.
We first find the path from $p_1$ to $p_3$ and obtain x_old = 0 at $p_3$.

The image of x_old = 0 by the path that goes from $p_3$ to $p_8$, and that
goes through the else branch of each if-then-else, is $-10 \le$ x_old $\le 10$. This
path is then added to our subgraph.

Moreover, there is no other path starting at $p_3$ whose image is not in
$-10 \le$ x_old $\le 10$.

Finally, since the abstract value associated to $p_8$ is $-10 \le$ x_old $\le 10$,
the path from $p_8$ to $p_3$ is feasible and is added into $P$. The final subgraph
is depicted on Figure 2, Step 2.

(3) We then compute the ascending iterations by path-focusing. At the end of
these iterations, we obtain $-\infty <$ x_old $< +\infty$ for both $p_3$ and $p_8$.

(4) We can now apply narrowing iterations, and recover the precision lost by
widening: we obtain $-100000 \le$ x_old $\le 100000$ at points $p_3$ and $p_8$.

(5) Finally, we compute the next subgraph. The SMT-solver does not find any
new path that makes the abstract values grow, and the algorithm terminates.

Our technique gives us the expected invariant x_old $\in [-100000, 100000]$. Here,
only 3 paths out of the 6 have been computed during the analysis. In practice,
depending on the order in which the SMT-solver returns the paths, other feasible
paths could have been added during the analysis.

In this example, we see that our technique actually combines the best of Guided
Static Analysis and Path-focusing.



SUCCINCT REPRESENTATIONS FOR ABSTRACT INTERPRETATION

4. Disjunctive Invariants

While many (most?) useful program invariants on numerical variables can be
expressed as conjunctions of inequalities and congruences, it is sometimes necessary
to introduce disjunctions. For instance, the loop for (int i=0; i<n; i++) {...} has
head invariant $0 \le i \le n \lor (i = 0 \land n < 0)$. For this very simple example, a
simple syntactic transformation of the control structure (into
i=0; if (i<n) do {...} while (i<n)) is sufficient, but in more complex cases more
advanced analyses are necessary [S HH US; in intuitive terms, they discover phases
or modes in loops.

Gulwani & Zuleger [16] proposed a technique for computing disjunctive
invariants, by distinguishing all the paths inside a loop. In this section, we
propose to improve this technique by using SMT queries to find interesting paths,
the objective being to avoid an explicit exhaustive enumeration of an exponential
number of paths.

For each control point $p_i$, we compute a disjunctive invariant
$\bigvee_{1 \le j \le m_i} X_{i,j}$. We denote by $n_i$ the number of distinct paths
starting at $p_i$. To perform the analysis, one chooses an integer
$\delta_i \in [1, m_i]$, and a mapping function
$\sigma_i : [1, m_i] \times [1, n_i] \to [1, m_i]$. The $k$-th path starting from $p_i$ is
denoted $\tau_{i,k}$. The image of the $j$-th disjunct $X_{i,j}$ by the path $\tau_{i,k}$ is
then joined with $X_{i,\sigma_i(j,k)}$. Initially, the $\delta_i$-th abstract value
contains the initial states of $p_i$, and all other abstract values contain $\emptyset$.

For each control point $p_i \in P_R$, $m_i$, $\delta_i$ and $\sigma_i$ can be defined heuristically. For 
instance, one could define $\sigma_i$ so that $\sigma_i(j, k)$ only depends on the last transition of 
the path, or else construct it dynamically during the analysis. 
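
As an illustration of this scheme with explicit path enumeration, here is a minimal sketch on intervals. It assumes a hypothetical single-variable loop with two paths, a static mapping in which the target disjunct depends only on the path taken, and 0-based indices; none of these names or choices come from the original algorithm.

```python
def join(a, b):
    """Interval join; None stands for the empty abstract value."""
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))

def path_low(x):
    """Image by path 0: guard x < 10, then x += 1."""
    lo, hi = x[0], min(x[1], 9)
    return None if lo > hi else (lo + 1, hi + 1)

def path_high(x):
    """Image by path 1: guard 10 <= x < 20, then x += 2."""
    lo, hi = max(x[0], 10), min(x[1], 19)
    return None if lo > hi else (lo + 2, hi + 2)

PATHS = [path_low, path_high]

def sigma(j, k):
    """Static mapping (assumption): the target depends only on the path."""
    return k

def analyze(initial, delta=0, m=2):
    X = [None] * m
    X[delta] = initial            # the delta-th disjunct holds the initial states
    changed = True
    while changed:                # explicit enumeration of all (disjunct, path) pairs
        changed = False
        for j in range(m):
            if X[j] is None:
                continue
            for k, path in enumerate(PATHS):
                image = path(X[j])
                if image is None:
                    continue
                target = sigma(j, k)
                new = join(X[target], image)
                if new != X[target]:
                    X[target], changed = new, True
    return X

disjuncts = analyze((0, 0))
print(disjuncts)
```

In this toy setting the guards bound the iteration, so no widening is needed. The two disjuncts keep the phases of the loop apart, whereas a single interval would also contain the unreachable value x = 11.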

Our method improves this technique in two ways: 

• Instead of enumerating the whole set of paths, we keep them implicit and 
compute them only when needed. 

• At each loop iteration of the original algorithm [16], an image by each path 
inside the loop is computed for each disjunct of the invariant candidate. 
Yet, many of these images may be redundant: for instance, if our invariant 
candidate is $(0 \le x \le 10 \wedge 0 \le y \le 1000) \vee (x \le -10 \wedge y \le -10)$, then there 
is no point enumerating paths whose image is included in this invariant 
candidate. In our approach, we compute such an image only if it makes the 
resulting abstract value grow. 
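
The redundancy argument of the second item can be made concrete on boxes. The sketch below reads the bounds of the example candidate as non-strict and uses a purely syntactic test, which is only a sufficient condition: it checks inclusion in a single disjunct, whereas our approach decides growth with an SMT query. The function names are illustrative.

```python
import math

def box_leq(a, b):
    """Box inclusion: every variable range of a lies within that of b."""
    return all(b[v][0] <= a[v][0] and a[v][1] <= b[v][1] for v in a)

def is_redundant(image, candidate):
    """Sufficient test: the image is covered by a single disjunct."""
    return any(box_leq(image, d) for d in candidate)

# Invariant candidate from the text, one box per disjunct.
candidate = [
    {"x": (0, 10), "y": (0, 1000)},
    {"x": (-math.inf, -10), "y": (-math.inf, -10)},
]

inside = is_redundant({"x": (2, 5), "y": (0, 10)}, candidate)    # covered
outside = is_redundant({"x": (5, 12), "y": (0, 10)}, candidate)  # x exceeds 10
print(inside, outside)
```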

Our improvement consists in a modification of the SMT formula we solve in 
Section 3. We introduce in this formula Boolean variables $\{d_j, 1 \le j \le m_i\}$, so that we 
can easily find in the model which abstract value of the disjunction of the source 
point has to be chosen to make the invariant of the destination grow. The resulting 
formula that is given to the SMT solver is defined by $g(p_i)$. When the formula is 
satisfiable, we know that the index $j$ of the starting disjunct that has to be chosen 
is the one for which the associated Boolean value $d_j$ is true in the model. Then, we 
can easily compute the value of $\sigma_i(j, k)$, and thus know the index of the disjunct to 
join with. 

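$$
g(p_i) = \rho \wedge b_i^s \wedge \bigwedge_{\substack{j \in P_R \\ j \neq i}} \neg b_j^s
\wedge \bigvee_{1 \le k \le m_i} \Big( d_k \wedge X_{i,k} \wedge \bigwedge_{l \neq k} \neg d_l \Big)
\wedge \bigvee_{j \in Succ(i)} \Big( b_j^d \wedge \bigwedge_{1 \le k \le m_j} (\neg X_{j,k}) \Big)
$$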

In our algorithm, the initialization of the abstract values slightly differs from 
Algorithm 1, line 5, since we now have to initialize each disjunct. Instead of line 5, 
we initialize $X_{i,k}$ with $\bot$ for all $k \in \{1, \dots, m_i\} \setminus \{\delta_i\}$, and $X_{i,\delta_i} \leftarrow I_{p_i}$. 

Furthermore, the Path-focused algorithm (line 17 of Algorithm 1) is enhanced 
to deal with disjunctive invariants, and is detailed in Algorithm 3. 

The Update function can classically assign 
$X_{i,\sigma_i(j,k)} \nabla (X_{i,\sigma_i(j,k)} \sqcup \tau_{i,k}(X_{i,j}))$ to $X_{i,\sigma_i(j,k)}$, 
or can integrate the special treatment for self loops proposed by [28], 
with widening/narrowing sequence or acceleration. 

Algorithm 3 Disjunctive invariant computation with implicit paths 

while true do 
    $res \leftarrow \mathrm{SmtSolve}[g(p_i)]$ 
    if $res = \mathit{unsat}$ then 
        break 
    end if 
    Compute the path $\tau_{i,k}$ from $res$ 
    Take $j \in \{l \mid d_l = \mathit{true}\}$ 
    Update($X_{i,\sigma_i(j,k)}$) 
end while 
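
The control structure of Algorithm 3 can be sketched as follows. This is only an illustration under strong assumptions: a hypothetical loop over one integer variable, intervals as abstract domain, a static mapping `sigma(j, k) = k`, an Update that is a plain join without widening, and a brute-force search over a small finite universe standing in for the SMT solver. All names are illustrative and indices are 0-based.

```python
# Hypothetical loop over one integer x, given as (guard_lo, guard_hi, increment):
# path 0: 0 <= x <= 9, x += 1;  path 1: 10 <= x <= 19, x += 2.
PATHS = [(0, 9, 1), (10, 19, 2)]
UNIVERSE = range(0, 32)   # finite state space explored by the stand-in solver

def join(a, b):
    """Interval join; None stands for the empty abstract value."""
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))

def contains(X, v):
    return X is not None and X[0] <= v <= X[1]

def image(X, k):
    """Abstract image of interval X by path k (guard, then increment)."""
    glo, ghi, inc = PATHS[k]
    lo, hi = max(X[0], glo), min(X[1], ghi)
    return None if lo > hi else (lo + inc, hi + inc)

def solve(X):
    """Stand-in for SmtSolve(g(p_i)): find a source disjunct j and a path k
    with a concrete step that leaves the current disjunctive invariant."""
    for j, Xj in enumerate(X):
        for k, (glo, ghi, inc) in enumerate(PATHS):
            for x in UNIVERSE:
                if (contains(Xj, x) and glo <= x <= ghi
                        and not any(contains(d, x + inc) for d in X)):
                    return j, k
    return None   # "unsat": no path makes the invariant grow

def sigma(j, k):
    """Static mapping (assumption): the target depends only on the path."""
    return k

def analyze(initial, m=2, delta=0):
    X = [None] * m
    X[delta] = initial
    while True:
        res = solve(X)
        if res is None:
            break
        j, k = res                                    # source disjunct and path
        target = sigma(j, k)
        X[target] = join(X[target], image(X[j], k))   # Update, without widening
    return X

result = analyze((0, 0))
print(result)
```

Each satisfiable query exhibits a concrete successor outside the current invariant, so the corresponding update strictly enlarges the target disjunct; in this bounded toy example the loop therefore terminates without widening.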



We experimented with a heuristic of dynamic construction of the $\sigma_i$ functions, 
adapted from [16]. For each control point $p_i \in P_R$, we start with one single disjunct 
($m_i = 1$) and define $\delta_i = 1$. $M$ denotes an upper bound on the number of disjuncts 
per control point. 

The $\sigma_i$ functions take as parameters the index of the starting abstract value, 
and the path we focus on. Since we dynamically construct these functions during 
the analysis, we store their already computed images in a compact representation, 
such as Algebraic Decision Diagrams. $\sigma_i(j, k)$ is then constructed on the fly only 
when needed, and computed only once. When the value of $\sigma_i(j, k)$ is required but 
undefined, we first compute the image of the abstract value $X_{i,j}$ by the path indexed 
by $k$, and try to find an existing disjunct of index $j'$ so that the least upper bound 
of the two abstract values is exactly their union (using SMT-solving). If such an 
index exists, then we set $\sigma_i(j, k) = j'$. Otherwise: 

• if $m_i < M$, we increase $m_i$ by 1 and define $\sigma_i(j, k) = m_i$ 

• if $m_i = M$, we define $\sigma_i(j, k) = M$ 

The main difference with the original algorithm [16] is that we construct $\sigma_i(j, k)$ 
using SMT queries instead of enumerating a possibly exponential number of paths 
to find a solution. 
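
The dynamic construction can be sketched on integer intervals, where the "join equals union" test is simply an overlap-or-adjacency check and needs no SMT query. The class below is a hypothetical illustration: a Python dictionary stands in for the Algebraic Decision Diagram, indices are 0-based (so the last disjunct has index M - 1), and the names `DynamicSigma`, `exact_join` and `update` are not taken from PAGAI.

```python
def exact_join(a, b):
    """Join of two integer intervals if it equals their union, else None."""
    if max(a[0], b[0]) <= min(a[1], b[1]) + 1:   # overlapping or adjacent
        return (min(a[0], b[0]), max(a[1], b[1]))
    return None

class DynamicSigma:
    def __init__(self, initial, max_disjuncts):
        self.X = [initial]        # one disjunct at start; it holds the initial states
        self.M = max_disjuncts    # upper bound on the number of disjuncts
        self.table = {}           # memoized sigma(j, k); a dict stands in for an ADD

    def sigma(self, j, k, image):
        if (j, k) in self.table:              # computed only once
            return self.table[(j, k)]
        target = None
        for jp, x in enumerate(self.X):       # look for an exact join
            if x is not None and exact_join(x, image) is not None:
                target = jp
                break
        if target is None:
            if len(self.X) < self.M:
                self.X.append(None)           # fresh, empty disjunct
            target = len(self.X) - 1          # new index, or the last one if full
        self.table[(j, k)] = target
        return target

    def update(self, j, k, image):
        """Join the image of disjunct j by path k into disjunct sigma(j, k)."""
        t = self.sigma(j, k, image)
        x = self.X[t]
        self.X[t] = image if x is None else (min(x[0], image[0]), max(x[1], image[1]))
        return t

s = DynamicSigma((0, 0), max_disjuncts=2)
s.update(0, 0, (1, 1))      # adjacent to (0, 0): exact join, same disjunct
s.update(0, 1, (10, 12))    # no exact join: a second disjunct is created
s.update(1, 1, (20, 25))    # no exact join and bound reached: last disjunct
print(s.X, s.table)
```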



5. Implementation and Experimental Comparisons 

We have implemented our proposed solutions inside a prototype intraprocedural 
static analyzer called PAGAI, as well as the classical abstract interpretation 
algorithm and the state-of-the-art techniques Path Focusing [28] and Guided Static 
Analysis [15]. It is available online at https://forge.imag.fr/projects/pagai/ 

The implementation is documented in [20]. 

PAGAI operates over LLVM bitcode [25j, i2J], which is a target for several 
compilers, most notably Clang (supporting C and C++) and llvm-gcc (supporting C, 
C++, Fortran and Ada). Abstract domains are provided by the APRON library 
[2^] and include convex polyhedra (from the builtin Polka "PK" library), octagons, 
intervals, and linear congruences. For SMT-solving, our analyzer uses Yices [12] or 
Microsoft Z3 [11]. 

PAGAI currently neither models the memory heap nor performs interprocedural 
analysis. Instead, LLVM optimization phases are applied prior to analysis, in order 
to inline non-recursive function calls and lift certain memory accesses to 
operations on explicit numerical variables (e.g. y=t[0]*t[0]; preceded by t[0]=x; without 
any aliased write in between is replaced by y=x*x;). The remaining memory reads 
are considered as indeterminates, and memory writes are ignored; this is a sound 
abstraction. 

We conducted extensive experiments on real-life programs in order to compare 
the different techniques, mostly on open-source projects (Fig. 3) written in C, C++ 
and Fortran. These results confirm that our combined technique improves the 
analysis in comparison with the two techniques taken individually, at a reasonable cost. 
The extension with disjunctive invariants increases precision in many cases, but 
with a higher cost in terms of execution time. 



6. Conclusion and Future Prospects 

Roughly, an analysis by abstract interpretation is defined by the choice of an 
iteration strategy and an abstract domain. In this article, we demonstrated that 
changes in the iteration algorithm can significantly improve precision, sometimes 
while improving analysis times. 

A common criticism of analysis techniques based on SMT-solving is that they do 
not scale up. Yet, our experiments show that, for numerical properties, they scale 
up to the size of typical functions and loops. It is however quite certain that, naively 
applied, they cannot scale to the kind of programs targeted by e.g. the Astrée tool, 
that is, dozens or hundreds of thousands of lines of code in a single loop operating 
over similar numbers of remanent variables. Actually, for such applications, only 
(quasi-)linear algorithms scale up, and "cheap" abstract domains such as octagons 
($O(n^3)$ time, where $n$ is the number of variables) are not applied to the full variable set, 
but to restricted subsets thereof. It thus seems reasonable that techniques such as 
considering "packs" of related variables, slicing, etc. may similarly help SMT-based 
techniques to scale to global analyses. 





                    Size             Execution time (seconds) 
Name              kLOC   |P_R|       S      G     PF   G+PF    DIS 
a2ps-4.14           55    2012      23     74     34    115    162 
gawk-4.0.0          59     902      15     46     12     40     50 
gnuchess-6.0.0      38    1222      50    220     81    312    351 
gnugo-3.8           83    2801      77    159     92    766   1493 
grep-2.9            35     820      41     85     22     65    122 
gzip-1.4            27     494      22    268     91    303    230 
lapack-3.3.1       954   16422     294   3740   3773   8159  10351 
make-3.82           34     993      67    108     53    109    257 
tar-1.26            73    1712      37    218    115    253    396 

Table 1. Execution times for various techniques 




Figure 3. Comparison of the abstract values obtained on several 
open-source projects. The table shows their respective number of 
lines of code, number of control points in $P_R$, and execution time 
for the various techniques. Techniques are classical abstract 
interpretation (S), Guided Static Analysis (G), Path-focused technique (PF), 
our combined technique (G+PF), and its version with disjunctive 
invariants (DIS). The $\subsetneq$ bars (resp. $\supsetneq$) give the percentage of 
invariants stronger (more precise; smaller with respect to inclusion) 
with the left-side (resp. right-side) technique, and "uncomparable" 
gives the percentage of invariants that are incomparable, i.e., 
neither greater nor smaller; the code points where both invariants 
are equal make up the remaining percentage. 

We compared the precision of different techniques and abstract domains by 
comparing the invariants with respect to the inclusion ordering. A better metric is 
perhaps to take a client analysis, such as the detection of overflows and array bound 
violations, and compare the rates of alarms. 
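
The inclusion-based comparison can be sketched as follows for interval invariants. The data are made up for illustration and the function names are hypothetical; the actual experiments compare invariants from richer domains such as convex polyhedra.

```python
from collections import Counter

def included(a, b):
    """Interval inclusion; None is the empty invariant."""
    if a is None:
        return True
    if b is None:
        return False
    return b[0] <= a[0] and a[1] <= b[1]

def compare(left, right):
    """Classify a pair of invariants computed at the same control point."""
    l_in_r, r_in_l = included(left, right), included(right, left)
    if l_in_r and r_in_l:
        return "equal"
    if l_in_r:
        return "left stronger"      # left invariant strictly smaller
    if r_in_l:
        return "right stronger"     # right invariant strictly smaller
    return "uncomparable"

def summarize(left_invs, right_invs):
    """Percentage of control points falling in each category."""
    counts = Counter(compare(left_invs[p], right_invs[p]) for p in left_invs)
    total = len(left_invs)
    return {label: 100.0 * n / total for label, n in counts.items()}

# Made-up invariants for four control points, one dictionary per technique.
left = {"p1": (0, 10), "p2": (0, 100), "p3": (0, 5), "p4": (0, 8)}
right = {"p1": (0, 10), "p2": (0, 50), "p3": (0, 9), "p4": (2, 12)}
summary = summarize(left, right)
print(summary)
```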

We focused on numerical properties, because they are supported by easily 
available abstract domain libraries. Yet, in most programs, properties of data structures 
are important for proving interesting properties. Further investigations are needed not 
only on good abstractions for pointers (many are already known) but also on their 
conversion to SMT problems. 